Model Evaluation by Comparison of Model-Based Predictions and Measured Values

Author

  • Hugh G. Gauch
Abstract

The appropriateness of a statistical analysis for evaluating a model depends on the model’s purpose. A common purpose for models in agricultural research and environmental management is accurate prediction. In this context, correlation and linear regression are frequently used to test or compare models, including tests of intercept $a = 0$ and slope $b = 1$, but unfortunately such results are related only obliquely to the specific matter of predictive success. The mean squared deviation (MSD) between model predictions X and measured values Y has been proposed as a directly relevant measure of predictive success, with MSD partitioned into three components to gain further insight into model performance. This paper proposes a different and better partitioning of MSD: squared bias (SB), nonunity slope (NU), and lack of correlation (LC). These MSD components are distinct and additive, they have straightforward geometric and analysis of variance (ANOVA) interpretations, and they relate transparently to regression parameters. Our MSD components are illustrated using several models for wheat (Triticum aestivum L.) yield. The MSD statistic and its components nicely complement correlation and linear regression in evaluating the predictive accuracy of models.

Comparisons of model-based and measured values arise frequently in agricultural research. For instance, a simulation model predicting yield from weather, soil, physiological, and morphological data can be compared with actual yield measurements to assess the model’s accuracy and merit. Similarly, after a model has been developed using one data set, model outputs may be compared with a different data set used for validation. Typical purposes for such comparisons are to assess a model’s predictive accuracy, to inform preferences among several competing models, to inform choices among various possible measurements serving as model inputs, to define the range of conditions over which a model is applicable or reliable, and to characterize the specific kinds of departures between model-based and measured values as a prelude to identifying specific possibilities for model refinement.

Here the convention is to use model outputs as predictors for actual measurements, symbolized by X and Y, respectively. One simple statistic for assessing a model’s merit is the correlation coefficient (r) between X and Y. Another common analysis is linear regression of Y on X to check whether the intercept (a) is near 0 and the slope (b) is near 1. Wallach and Goffinet (1989) observed that agricultural and ecological researchers use diverse statistical analyses for model evaluation, but many of these analyses fail to quantify directly the predictive accuracy of a model, even when such is the researchers’ explicit objective. This confusion persists. For instance, see the 10 papers from a symposium on “Crop Modeling and Genomics” published recently in this journal (Agronomy Journal 95:4–113). That symposium illustrates the frequent use of correlation and regression for model evaluation.

However, Kobayashi and Salam (2000) present cogent reasons why the correlation coefficient and linear regression are not entirely satisfactory for model evaluation and suggest that MSD and its components are often more informative. Further developing those findings, a different partitioning of MSD components has the advantage of yielding distinct components with straightforward meanings.

COMPONENTS OF MEAN SQUARED DEVIATION

Model-based and measured values, X and Y, can be compared for the purpose of evaluating a simulation model. An important statistic for this purpose is the mean squared deviation between X and Y. The MSD is simply the sum of squared deviations between X and Y, divided by the number of observations, N, where summation is over n = 1, 2, ..., N,

  $\mathrm{MSD} = \sum_{n=1}^{N} (X_n - Y_n)^2 / N$   [1]

Let $\bar{X}$ and $\bar{Y}$ be the means. Also, let $x_n = X_n - \bar{X}$ and $y_n = Y_n - \bar{Y}$ be the deviations from the means. The partitioning of MSD suggested by Kobayashi and Salam (2000) has three components [also see Xie et al. (2001) and Ewert et al. (2002)]. Using their notations, the first component is SB, which arises from these two means being unequal,

  $\mathrm{SB} = (\bar{X} - \bar{Y})^2$   [2]

The (population) standard deviation of the simulation, $\mathrm{SD}_s$, is $\sqrt{\sum x_n^2 / N}$, and likewise the measurements have an $\mathrm{SD}_m$ of $\sqrt{\sum y_n^2 / N}$. Accordingly, their second component, the difference in the magnitudes of fluctuation between the measurements and simulations (SDSD), arises from these two standard deviations being unequal,

  $\mathrm{SDSD} = (\mathrm{SD}_s - \mathrm{SD}_m)^2$   [3]

Third and finally, there is lack of positive correlation weighted by the standard deviations (LCS).

H.G. Gauch and G.W. Fick, Crop and Soil Sciences, Cornell Univ., Ithaca, NY 14853; J.T.G. Hwang, Dep. of Mathematics and Dep. of Statistical Science, Cornell Univ., Ithaca, NY 14853. Received 12 Feb. 2002. *Corresponding author ([email protected]).

Published in Agron. J. 95:1442–1446 (2003). © American Society of Agronomy, 677 S. Segoe Rd., Madison, WI 53711 USA

Abbreviations: ANOVA, analysis of variance; LC, lack of correlation; LCS, lack of positive correlation weighted by the standard deviations of the measurements and simulations; MSD, mean squared deviation; MSEP, mean squared error of prediction; NU, nonunity slope; SB, squared bias; SDSD, difference in the magnitudes of fluctuation between the measurements and simulations.
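As a rough illustration of Eq. [1] to [3], the following Python sketch computes MSD, SB, and SDSD for paired predictions and measurements. The function name `msd_components` and the sample data are hypothetical and are not taken from the paper. The excerpt breaks off before giving a formula for LCS; the sketch assumes the form 2 SDs SDm (1 - r), where r is the correlation coefficient between X and Y. This is the choice that makes the three components sum to MSD, and it should be checked against Kobayashi and Salam (2000) before being relied on.

```python
import math


def msd_components(x, y):
    """Return MSD and a three-way partition for paired data.

    x: model-based predictions, y: measured values (same length N).
    The function name and return layout are illustrative only.
    """
    if len(x) != len(y) or len(x) == 0:
        raise ValueError("x and y must be non-empty and of equal length")
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n

    # Eq. [1]: mean squared deviation between predictions and measurements.
    msd = sum((xi - yi) ** 2 for xi, yi in zip(x, y)) / n

    # Eq. [2]: squared bias, arising from the two means being unequal.
    sb = (x_bar - y_bar) ** 2

    # Population standard deviations (divide by N, not N - 1).
    sd_s = math.sqrt(sum((xi - x_bar) ** 2 for xi in x) / n)
    sd_m = math.sqrt(sum((yi - y_bar) ** 2 for yi in y) / n)

    # Eq. [3]: squared difference of the two standard deviations.
    sdsd = (sd_s - sd_m) ** 2

    # ASSUMPTION (not stated in the excerpt): LCS = 2 * SDs * SDm * (1 - r).
    # Since r = cov / (SDs * SDm), this equals 2 * (SDs * SDm - cov), which
    # avoids dividing by zero when one of the series is constant.
    cov = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / n
    lcs = 2 * (sd_s * sd_m - cov)

    return {"MSD": msd, "SB": sb, "SDSD": sdsd, "LCS": lcs}


# Hypothetical yields (arbitrary units), for illustration only.
predicted = [2.0, 3.0, 4.0, 5.0]
measured = [2.5, 2.5, 4.5, 5.5]
result = msd_components(predicted, measured)
print(result)
```

Under that assumption the three components add up to MSD, which gives a quick sanity check on any implementation.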


Similar resources

Evaluation of Drainmod model in Irrigation and Drainage Network of ABADAN Date palms

In this study, the Drainmod model was used to compare measured water table depth fluctuations on the farm with those simulated by the model; measured drainage outflow from the farm drains was also compared with the simulated outflow. The study was carried out in part of the Irrigation and Drainage Network of ABADAN Date palms. By statistical analysis of the measured and predicted water table de...


Comparison of Linear and Threshold Models for Estimation Genetic and Phenotypic Parameters of Success of Conception at First Service and Inseminations to Conception in Holstein Cattles in East Azarbayjan Province

In this research, genetic and phenotypic parameters for reproductive traits were estimated using linear and threshold models, with data from 6 large industrial dairy herds of East Azerbaijan province collected by the Agriculture Jihad Organization over 10 years (2001-2010). Best linear unbiased predictions of breeding values for the traits were estimated using the Restricted Maximum Likelihood method by WOMBAT sof...


Correlations and Predictions of THF + 2-Alkanol Binary Mixtures Behaviour by PC-SAFT Model and Friction Theory

In this article, the behavior of binary mixtures of tetrahydrofuran (THF) with 2-alkanols, namely 2-propanol, 2-butanol, 2-pentanol, 2-hexanol and 2-heptanol, has been studied through density and viscosity measurements as a function of composition within the temperature range of 293.15–313.15 K. The excess molar volume, isobaric thermal expansivity, partial molar volumes, and viscosity deviation...


Introducing hard rock TBMs’ downtime analysis model with reference to past case histories’ data

The study of downtime, and consequently of machine utilization, in a given project is one of the major requirements for an accurate estimate of TBM performance and daily advance rate. Interestingly, while it is very common to report the components of downtime when discussing a tunneling project in the literature, there have not been many in-depth studies on this topic in the recent year...


Comparison of Different Model Predictions on RBE in the Proton Therapy Technique Using the GATE Code

Proton therapy has recently come into use as one of the effective methods for treating various types of cancer in the clinic. An appropriate formalism for obtaining relative biological effectiveness values is needed for treatment planning studies in this hadrontherapy technique. Accordingly, the biological dose, rather than the physical dose, is introduced to evaluate the biological eff...



Publication date: 2003